Using latent semantic indexing for morph-based spoken document retrieval

Authors

  • Ville T. Turunen
  • Mikko Kurimo
Abstract

Previously, phone-based and word-based approaches have been used for spoken document retrieval. The former suffers from high error rates and the latter from the limited vocabulary of the recognizer. Our method relies on an unlimited-vocabulary continuous speech recognizer that uses morpheme-like units discovered in an unsupervised manner. The morpheme-like units, or “morphs” for short, have also been successfully used as...

Similar resources

Improved Chinese spoken document retrieval with hybrid modeling and data-driven indexing features

Different models retrieve the documents based on different approaches of extracting the underlying content. Different levels of indexing features also offer different functionalities and discriminabilities when retrieving the documents. In this paper, we present results for Chinese spoken document retrieval with hybrid models to integrate the knowledge obtainable from three basic retrieval mode...

Indexing Audio Documents by using Latent Semantic Analysis and SOM

This paper describes an important application for state-of-the-art automatic speech recognition, natural language processing and information retrieval systems. Methods for enhancing the indexing of spoken documents by using latent semantic analysis and self-organizing maps are presented, motivated and tested. The idea is to extract extra information from the structure of the document collection an...

Fusion of Semantic and Acoustic Approaches for Spoken Document Retrieval

Most spoken document retrieval systems use the words derived from a large vocabulary speech recognizer as the internal representation for indexing the document. However, the use of recognition transcripts inherently limits the performance of the system since the size of the dictionary restricts the number of queries for which matches can be found. In this paper we present a new approach to this...

Thematic indexing of spoken documents by using self-organizing maps

A method is presented to provide a useful searchable index for spoken audio documents. The task differs from traditional (text) document indexing, because large audio databases are decoded by automatic speech recognition and decoding errors occur frequently. The idea in this paper is to take advantage of the large size of the database and select the best index terms for each document with th...

Publication date: 2006